
***
*This version created June 4, 2018
*final update/annotation June 24, 2021
*PURPOSE: merge the spending survey data (quarterly durables spending and related variables) with the expectations survey data (inflation expectations, unemployment expectations, etc)
*brief description of merge method: as a first choice, the "current" expectations are those reported in the first month of the spending quarter; if none are available, we choose expectations
*dated as early as the second month of the previous quarter; if none are available, the spending observation is considered "unmatched" and is dropped
*lagged expectations are also created, in such a way as to avoid overlap between "current" and "lagged" expectations
*durables are deflated item by item and then summed; nominal durables spending and the "old" version of deflated durables are also retained
********************** Quarterly ****************

clear
clear matrix
set more off
set scheme s1color
estimates clear
graph drop _all
set matsize 2500
log close _all

** Set Directory
cd "../Do"

** Locals
*list of goods for nondurables spending aggregate

local spendcats "electricity water heatingfuel phonecable housecleaningproducts housecleaningservice gardenproducts gardenservice clothing personalcare drugs healthcareservices medsupplies entertainment hobbies personalservices otherchildspending foodhome foodout gasoline"

*list for durables aggregate
local spendcats_long "pricefridge priceoven pricedwasher pricewasher pricetv pricecomputer furniture"

local expectations "d_inflmedian d_inflquant1 d_inflquant2 q7 q6 q2a q40 d_wagemedian d_wagequant1 d_wagequant2 d_longinflquant2 d_longinflquant1 d_longinflmedian"

local housing "mortgage rent"

local bought "boughtcar boughtfridge boughtoven boughtwasher boughtdwasher boughttv boughtcomputer"

*quarterly-surveyed spending categories not deemed durable goods

local q_extra "homeins proptax carins carmaint healthins trips homerepair homerepair_services gifts charity downpmt_car amtstocks"
*loading the RAND-ALP spending survey data
use ../Data/cons_panel2.dta, clear

drop month year
*date refers to date of survey completion
sort prim_key date
*month is month that the date belongs to
gen surv_month = mofd(date)
format surv_month %tm

sort prim_key surv_month


sort surv_month 
*generates last month in the spending quarter for durables, or actual spending month for nondurables (or spending month for durables later in the sample)  
gen spend_month = surv_month-1
format spend_month %tm
*identify first day of month prior to survey completion, in order to then identify the quarter
gen spend_month_day=dofm(spend_month)
format spend_month_day %td
*identify the spending quarter based on 1st day of last month of the quarter (could have done it any number of ways) 
gen quarter = qofd(spend_month_day)
format quarter %tq
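*illustrative check (hypothetical survey date, not drawn from the data) of the date arithmetic above:
*a survey completed on July 10, 2012 has surv_month 2012m7 and spend_month 2012m6, whose first day is 01jun2012, so the spending quarter is 2012q2
*the display commands only print to the log and leave the data untouched
display %tm mofd(mdy(7,10,2012))-1
display %td dofm(mofd(mdy(7,10,2012))-1)
display %tq qofd(dofm(mofd(mdy(7,10,2012))-1))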

*gen obs = 1
sort prim_key spend_month

gen quarter_firstday=dofq(quarter)
format quarter_firstday %td



/* this shows that after november 2011, durables are collected each month.
sort month
collapse (sum) `spendcats_long' `bought', by(month)
browse month `spendcats_long' `bought'
ak
*/

*the command below may look incorrect, but it is intentional: the "bought" variables are nonmissing only if someone purchased the item, and instead of coding a purchase as 1 for each item
*the survey coded a purchase event using a different positive whole number for each item; the loop turns each indicator into a 1/missing dummy variable, but for completeness one could recode missing outcomes to zero
foreach v in `bought' {
	replace `v' = 1 if `v' !=. 
}

*fixed here to correct inconsistencies between bought indicators and price indicators--letting price indicators override purchase indicators 


gen boughtfurniture = 1 if furniture>0 & furniture!=.
replace boughtfridge=1 if pricefridge>0 & pricefridge!=.
replace boughttv=1 if pricetv>0 & pricetv!=.
replace boughtoven=1 if priceoven>0 & priceoven!=.
replace boughtwasher=1 if pricewasher>0 & pricewasher!=.
replace boughtdwasher=1 if pricedwasher>0 & pricedwasher!=. 
replace boughtcomputer=1 if pricecomputer>0 & pricecomputer!=.


*browse furniture_count furniture prim_key month

*for all durables variables, the count does not take into account the number of units of each item, or their relative value
*including boughtcar in the list of items to count was a problem: later on, the dummy for any durables purchase was based on this count being positive, so it included car purchases
*there is an easy fix in later do files, or use the new code below

*old code
*egen durcount = rowtotal(`bought' boughtfurniture)
*new code Nov 2020
egen durcount= rowtotal(boughtcomputer boughtwasher boughtdwasher boughtoven boughttv boughtfridge boughtfurniture)

*count

gen obs = 1
sort prim_key spend_month
*this collapse command (MB) handles the fact that later in the sample (starting October 2011) people reported durables spending at monthly rather than quarterly frequency
*payments for car, mortgage and rent were ALWAYS reported at monthly frequency, so this step turns those into quarterly payment amounts
*this also means that durables counts are summed within the quarter, which affects the period when durables are reported monthly--that should still be OK, although the "bought" variables should be dummies (after summing they are counts)
*variable "amtmort" is mortgage balance (or payment) and is asked monthly, so this gives the within-quarter average--this is a problem if some months are missing, but outcomes are highly robust to using the first nonmissing value instead
 
collapse (sum) obs `spendcats_long' `bought' boughtfurniture durcount car mortgage rent education (firstnm) howner retacct mort stocks (mean) amtmort carins, by(prim_key quarter)
sort prim_key quarter


local spendcats_long "pricefridge priceoven pricedwasher pricewasher pricetv pricecomputer furniture"

gen quarter_firstday=dofq(quarter)
format quarter_firstday %td
*this has to do with when they start asking about durables on a monthly basis
*retains only cases in which durables spending is observed in all 3 months of the quarter, so that quarterly spending amounts are comparable to those when surveyed only at quarterly frequency
drop if quarter_firstday >= mdy(10,1,2011) & obs!=3
*first month of spending quarter 
gen quarter_firstmonth = mofd(quarter_firstday) 
format quarter_firstmonth %tm


egen durables = rowtotal(`spendcats_long')

*MB June 2018 inflation adjustment: note that this is an appliances CPI, even though furniture is included in the durables spending total  
*MB: all outcomes robust to using different deflation methods or just using nominal values 
sort prim_key quarter
merge m:1 quarter using ../Data/app_cpi.dta
tab _merge
drop if _merge!=3
drop _merge

*Arman's old deflation method
*deflating aggregate (total durables spending), and will save individual purchases for by-category deflation and analysis, same result for durables
*replace durables = durables/app_cpi_q
*instead of replacing nominal durables, i'm generating a real durables variable, so that we can use both
gen durables_real1=durables/app_cpi_q

* individual durable goods categories
egen appliance_sum = rowtotal(pricefridge priceoven pricewasher pricedwasher)

sort prim_key quarter
merge m:1 quarter using ../Data/durables_cpi.dta
tab _merge
drop if _merge!=3
drop _merge

replace appliance_sum = appliance_sum / appliance_cpi
replace pricefridge = pricefridge/appliance_cpi
replace priceoven = priceoven/appliance_cpi
replace pricewasher = pricewasher/washer_cpi
replace pricedwasher = pricedwasher/appliance_cpi
replace pricetv = pricetv/tv_cpi
replace pricecomputer = pricecomputer/computer_cpi
replace furniture = furniture/furniture_cpi

*at this point each individual item has been deflated by the closest deflator available; we then add the individual items to get a deflated sum of durables spending

*generating new measure of real durables spending: sum of individually deflated items 
egen durables_real2=rowtotal(pricefridge priceoven pricewasher pricedwasher pricetv pricecomputer furniture)
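*worked arithmetic (hypothetical numbers and index scale, chosen only for illustration) contrasting the two deflation methods:
*suppose nominal spending is 1000 on a fridge and 500 on a tv, with an aggregate appliances index of 1.25, an appliance index of 1.25 and a tv index of 0.5
*deflating the total by one index (as in durables_real1) gives 1200; deflating item by item and then summing (as in durables_real2) gives 1800
*the display commands only print to the log and leave the data untouched
display (1000+500)/1.25
display 1000/1.25 + 500/0.5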
*now "durables" refers to nominal total, "..real1" refers to Arman's deflation method, and "...real2" refers to my new deflation method
*now "month" will refer to first month of the spending quarter for durables; 
rename quarter_firstmonth month
tempfile durables_spending2
save `durables_spending2' 


*now using expectations surveys to pull expectations from first month in spending quarter, then in each of the preceding two months
*now generate the `monthly_exp' file

*following dataset contains just expectations data (not also spending) 
use ../Data/complete_panel_alldates.dta, clear

sort unique survey 
*merging wage expectations: density medians, other info about the fitted distribution 
merge 1:1 unique survey using ../Data/wagedensity_data_long.dta , gen(wage_merge)
tab wage_merge

*merging one-year-ahead inflation expectations: density medians and other info about the fitted distribution 
sort unique survey 
merge 1:1 unique survey using ../Data/infl_density_data_long.dta, gen(infl_merge)
tab infl_merge

*merging 2-3 year ahead inflation expectations (analogous to above)
sort unique survey
merge 1:1 unique survey using ../Data/longinfl_density_data_long.dta, gen(longinfl_merge)
tab longinfl_merge

*MB June 1 2018: if we don't impose nonmissing on d_wagemedian we will lose fewer observations; we would gain back only a very small number if we don't require d_longinflmedian nonmissing; 
drop if d_inflmedian==. | d_wagemedian==. | d_longinflmedian==.


**generating real wage expectation as nominal wage density median minus inflation density median
gen rw_expect = d_wagemedian - d_inflmedian
*generating nominal wage growth uncertainty as interquartile range of nominal wage expectation distribution
gen d_wageiqr = d_wagequant2 - d_wagequant1
*generating inflation uncertainty as interquartile range of one-year-ahead inflation expectation distribution 
gen d_infliqr = d_inflquant2 - d_inflquant1
*generating long-run inflation uncertainty (analogous to above)
gen d_longinfliqr = d_longinflquant2 - d_longinflquant1


drop month year
*date refers to date of expectations survey
gen exp_month = mofd(date)
format exp_month %tm
*don't need to generate an expectations quarter because merging will occur by month (matching to first month in the quarter); durables data has its own quarter variable
*gen exp_quarter = qofd(date)
*format exp_quarter %tq

*in the expectations survey, date refers to survey completion, same as expectations date
gen expectations_date = date
format expectations_date %td

*
preserve
*MB June 1 2018: this is done because some people fill out two expectations surveys in the same month
bysort prim_key exp_month: egen mindate = min(date)
format mindate %td
*MB June 1 2018: keeping only the earliest-dated expectations survey completed within a given month 
keep if date==mindate
drop if mindate==.
drop mindate
*in the first instance, for matching with spending (where "month" refers to first month of spending quarter), choosing expectations from that same month (letting the matched month be same as first month of spending quarter)
gen month=exp_month
format month %tm
sort prim_key month
*unique is the panel identifier
xtset unique month
*generate lagged expectations: xtset data on month, then define the lag as the 3rd monthly lag of IE (from the first month of the lagged quarter) or, if that is missing, the most recent nonmissing lag up through the 12th
*this is done so that lagged expectations don't overlap with current expectations--which might be drawn from the second or third month of the lagged quarter


gen lag_IE=L3.d_inflmedian 
forvalues i=4/12 {
replace lag_IE=L`i'.d_inflmedian if lag_IE==.
}
*generate lagged inflation uncertainty: same as above 
gen lag_infl_iqr=L3.d_infliqr 
forvalues i=4/12 {
replace lag_infl_iqr=L`i'.d_infliqr if lag_infl_iqr==.
}
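/* sketch of the lag fallback above on toy data (hypothetical values; assumes a fresh Stata session, so it is left commented out here)
*person 1 has x observed in 2010m1 and 2010m2, plus a survey in 2010m11 for which the 3rd through 8th lags do not exist
*the 9th lag (2010m2) is the most recent one available, so lag_x should equal 5 in 2010m11
clear
input id month x
1 600 2
1 601 5
1 610 9
end
format month %tm
xtset id month
gen lag_x = L3.x
forvalues i = 4/12 {
	replace lag_x = L`i'.x if lag_x==.
}
list id month x lag_x
assert lag_x==5 if month==610
*/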
*this is the concurrent monthly expectations (M1)
tempfile monthly_exp
save `monthly_exp'

restore
**

*generating M-1 data (expectations from previous month): now keeping only latest-dated expectations in a given month, because these are already lagged relative to start of the spending quarter
bysort prim_key exp_month: egen maxdate = max(date)
format maxdate %td
*keeping only latest-dated expectations survey within a month
keep if date==maxdate
drop if maxdate==.
drop maxdate


*MB June 1 2018: pushing these expectations up to the following month, to later assign them as the lagged expectations for that month. These are the expectations from the previous month that were closest in time to the start of that month (if there were multiple expectations in the previous month)
*MB June 1 2018: if there is only 1 set of expectations in the month, then they are assigned as the current expectations for the given month, and as the lagged expectations for the month after
preserve
gen month=exp_month+1
format month %tm
*MB June 1 2018: can assign quarter using floor of month/3, because month is "months since a given date" and quarter is "quarters since a given date", and there will be one-third as many quarters as months...up to an integer rounding
*gen quarter = floor(month/3)
*format quarter %tq
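*illustrative check of the floor(month/3) claim above (the display commands only print to the log and leave the data untouched)
*2012m4 through 2012m6 are monthly dates 627 through 629; dividing by 3 and taking the floor gives 209, which is 2012q2
display %tq floor(tm(2012m4)/3)
display %tq floor(tm(2012m6)/3)
display floor(tm(2012m6)/3)==qofd(dofm(tm(2012m6)))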
*want to sort on month rather than exp_month because that is how the merging will happen


*need to merge by month to guarantee that the expectations were dated either in the first month of spending quarter or in one of the previous two months--using quarter could yield faulty results

sort prim_key month
*"unique" is panel identifier
xtset unique month
*generate lagged expectations: xtset data on month, then define the lag as the 2nd lag of IE (from the first month of the lagged quarter) or, if that is missing, the most recent nonmissing lag up through the 11th--which corresponds to twelve months
*before the first month of the spending quarter (this is because in this case the "current" expectations are actually drawn from the month just before the spending quarter)
*this is done so that lagged expectations don't overlap with current expectations--which in this case are drawn from the 3rd month of the lagged quarter and in others from the 2nd month of the lagged quarter
gen lag_IE=L2.d_inflmedian 
forvalues i=3/11 {
replace lag_IE=L`i'.d_inflmedian if lag_IE==.
}
*generate lagged inflation uncertainty: same as above 
gen lag_infl_iqr=L2.d_infliqr 
forvalues i=3/11 {
replace lag_infl_iqr=L`i'.d_infliqr if lag_infl_iqr==.
}
tempfile monthly_exp_Mneg1
save `monthly_exp_Mneg1'

restore

*generating M-2 data
*pushes the month up by 2--so that expectations from late in a given month are assigned as the twice-lagged expectations for the month two months later
gen month  = exp_month+2
format month %tm
*gen quarter = floor(month/3)
*format quarter %tq
sort prim_key month
*"unique" is panel identifier
xtset unique month
*generate lagged expectations: xtset data on month, then define lag as either the 1st lag of IE (will be from first month of lagged quarter) up through the 10th lag--which corresponds to twelve months
*before the first month of the spending quarter (this is because in this case the "current" expectations are actually drawn from the second month of the lagged spending quarter)
*this is done so that lagged expectations don't overlap with current expectations--which in this case are drawn from 2nd month of the lagged quarter
gen lag_IE=L1.d_inflmedian 
forvalues i=2/10 {
replace lag_IE=L`i'.d_inflmedian if lag_IE==.
}
*generate lagged inflation uncertainty: same as above 
gen lag_infl_iqr=L1.d_infliqr 
forvalues i=2/10 {
replace lag_infl_iqr=L`i'.d_infliqr if lag_infl_iqr==.
}
tempfile monthly_exp_Mneg2
save `monthly_exp_Mneg2'


use `durables_spending2' , clear

**** Merging in expectations to durables spending data

*first merges in expectations formed in the first month of the spending quarter (called "month" in the durables dataset)--the earliest ones available if there are two in the same month
merge 1:1 prim_key month using `monthly_exp', gen(merge_status) 

preserve
*this variable indicates how many months lagged the matched expectations are relative to the first month in the spending quarter (0 indicates they are from the same month)
gen month_match = 0
keep if merge_status==3
*this tempfile saves just the observations that matched with an expectation for the same month--where month should refer to first month of the spending quarter 
tempfile quarterly_matches_1
save `quarterly_matches_1'

*now restore original durables spending data, after merge but before dropping non-merged items
restore
*keep things that did not have a match in the expectations data set
keep if merge_status==1
drop merge_status

*now merging expectations formed in the once-lagged month (latest available from that month if there are two)
merge 1:1 prim_key month using `monthly_exp_Mneg1', gen(merge_status) update
*note that hundreds had missing values updated
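/* sketch of how merge with the update option codes its results, on toy data (hypothetical values; assumes a fresh Stata session, so it is left commented out here)
*the master data has x missing for both ids, as the expectations variables are for spending observations left unmatched by the first merge
*id 1 should come back with x filled in and merge_status 4 (missing updated); id 2 stays master-only with merge_status 1
clear
input id x
1 7
end
tempfile toy_using
save `toy_using'
clear
input id x
1 .
2 .
end
merge 1:1 id using `toy_using', gen(merge_status) update
list id x merge_status
assert merge_status==4 & x==7 if id==1
assert merge_status==1 if id==2
*/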
preserve
*this variable indicates how many months lagged the matched expectations are relative to the first month in the spending quarter
gen month_match = -1
keep if merge_status>2 & merge_status!=. 
*this tempfile saves just the observations that matched with an expectation for the previous month;  
tempfile quarterly_matches_Mneg1
save `quarterly_matches_Mneg1'
*now restore durables spending data, after most recent merge but before dropping non-merged items
restore
*keeping non-merged observations
keep if merge_status==1
drop merge_status

*now merging expectations formed in the twice-lagged month (latest avail for that month) 
merge 1:1 prim_key month using `monthly_exp_Mneg2', gen(merge_status) update

*this variable indicates how many months lagged the matched expectations are relative to the first month in the spending quarter
gen month_match=-2
*now keeping things that were matched with expectations from 2 months ago
keep if merge_status>2 & merge_status!=. 
append using `quarterly_matches_1'
append using `quarterly_matches_Mneg1'
drop merge_status
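*optional diagnostic (read-only; not part of the original workflow): tabulate how many spending observations were matched to expectations from the first month of the spending quarter (0), the previous month (-1), or two months earlier (-2)
tabulate month_match, missing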



count if d_inflmedian!=.

*note that in the merged dataset the "tsend" date refers to completion of the spending survey--expectations survey date information is retained as "expectations_date", "exp_month", etc. This makes the matching more transparent in the data

save ../Data/matched_data_durables_Jun2018_baseline.dta, replace
// save ../Data/matched_data_durables_Jun2021_baseline.dta, replace
*in the matched dataset, "quarter" refers to the spending quarter, whereas other variables such as "month" refer to the expectations month or some transformation of it--"exp_month" is the actual expectations month
*above was the process of matching durables spending with expectations formed either in the first month of the spending quarter, or in either of the two previous months, along with appropriately
*defined lagged expectations
